On integral generalized policy iteration for continuous-time linear quadratic regulations

Authors

  • Jae Young Lee
  • Jin Bae Park
  • Yoon Ho Choi
Abstract

This paper mathematically analyzes the integral generalized policy iteration (I-GPI) algorithms applied to a class of continuous-time linear quadratic regulation (LQR) problems with an unknown system matrix A. GPI is the general idea of interacting the policy evaluation and policy improvement steps of policy iteration (PI) to compute the optimal policy. We first introduce the update horizon T, and then show that i) all of the I-GPI methods with the same T can be considered equivalent, and that ii) the value function approximated in the policy evaluation step monotonically converges to the exact one as T → ∞. This reveals the relation between the computational complexity and the update (or time) horizon of I-GPI, as well as the relation between I-PI and I-GPI in the limit T → ∞. We also provide and discuss two modes of convergence of I-GPI: in one mode I-GPI behaves like PI, and in the other it performs like value iteration for discrete-time LQR and infinitesimal GPI (T → 0). From these results, a new classification of integral reinforcement learning is formed with respect to T. Two matrix inequality conditions for stability, the region of local monotone convergence, and data-driven (adaptive) implementation methods are also provided, with detailed discussion. Numerical simulations are carried out for verification and further investigation.
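For orientation, the model-based policy iteration (Kleinman's algorithm) that the I-GPI family reduces to in the policy-iteration limit can be sketched as follows. This is a minimal illustration, not the paper's method: the system matrices, cost weights, and initial stabilizing gain below are assumed values, and unlike I-GPI the sketch uses full knowledge of A.

```python
import numpy as np

# Illustrative CT system (assumed values; I-GPI itself does not require A)
A = np.array([[0.0, 1.0], [-1.0, 2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)   # state cost weight
R = np.eye(1)   # input cost weight
n = A.shape[0]

def lyap_ct(F, W):
    """Solve F^T P + P F + W = 0 by row-major vectorization."""
    M = np.kron(F.T, np.eye(n)) + np.kron(np.eye(n), F.T)
    return np.linalg.solve(M, -W.flatten()).reshape(n, n)

K = np.array([[0.0, 5.0]])  # assumed initial stabilizing gain
assert np.all(np.linalg.eigvals(A - B @ K).real < 0)

for _ in range(20):
    Ak = A - B @ K
    # Policy evaluation: Lyapunov equation for the current policy u = -Kx
    P = lyap_ct(Ak, Q + K.T @ R @ K)
    # Policy improvement: K <- R^{-1} B^T P
    K = np.linalg.solve(R, B.T @ P)

# P now (approximately) solves the ARE: A^T P + P A - P B R^{-1} B^T P + Q = 0
```

I-GPI replaces the exact policy-evaluation step above with a finite-horizon integral evaluation driven by measured data, which is where the update horizon enters.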


Related articles

Integral Q-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems

This paper proposes an integral Q-learning for continuous-time (CT) linear time-invariant (LTI) systems, which solves a linear quadratic regulation (LQR) problem in real time for a given system and a value function, without knowledge about the system dynamics A and B. Here, Q-learning is referred to as a family of reinforcement learning methods which find the optimal policy by interaction with ...


Stochastic differential equations with fractal noise

Stochastic differential equations in R with random coefficients are considered where one continuous driving process admits a generalized quadratic variation process. The latter and the other driving processes are assumed to possess sample paths in the fractional Sobolev space W β 2 for some β > 1/2. The stochastic integrals are determined as anticipating forward integrals. A pathwise solution p...


A Novel Generalized Value Iteration Scheme For Partially-Unknown Continuous-Time Linear Systems

In this paper, a novel generalized value iteration technique is presented for solving online the discounted linear quadratic (LQ) optimal control problems for continuous-time (CT) linear systems with an unknown system matrix A. In the proposed method, the discounted value function is considered, which is a general setting in reinforcement learning (RL) frameworks, but not fully considered in RL...


A New Inexact Inverse Subspace Iteration for Generalized Eigenvalue Problems

In this paper, we present an inexact inverse subspace iteration method for computing a few eigenpairs of the generalized eigenvalue problem Ax = λBx [Q. Ye and P. Zhang, Inexact inverse subspace iteration for generalized eigenvalue problems, Linear Algebra and its Applications, 434 (2011) 1697-1715]. In particular, the linear convergence property of the inverse subspace iteration is preserved.


Adaptive linear quadratic control using policy iteration - American Control Conference, 1994

In this paper we present stability and convergence results for Dynamic Programming-based reinforcement learning applied to Linear Quadratic Regulation (LQR). The specific algorithm we analyze is based on Q-learning and it is proven to converge to the optimal controller provided that the underlying system is controllable and a particular signal vector is persistently excited. This is the first c...



Journal:
  • Automatica

Volume 50  Issue 

Pages  -

Published 2014